All benchmarks were run on a single node; each reported value is the average of 10 runs.

Metrics

Some of the reported HPCC metrics:

Results

Overall Performance (that is, across all MPI processes)

High Performance LINPACK Floating-Point Performance, MFLOP per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 800391.75 5234.33
Ookami mvapich2/libsci/intfft/cce 48 794972.85 5294.60
Ookami openmpi/armpl/armpl/gcc 48 3611.72 9.14
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 46535.57 237.15
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 47026.24 141.35
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 45474.51 69.34
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 581802.00 3172.57
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 435367.30 991.71
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 330640.90 2516.47
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 485480.90 1243.16
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 458037.20 2048.63
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 421653.10 3126.48
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 988258.60 20198.84
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 987046.80 14078.96
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 515101.60 10552.39
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 554204.50 62818.10
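
Because the resource names in these tables contain spaces, a row is easiest to read programmatically by splitting from the right. The following is a minimal sketch, assuming every row ends with the four fields named in the header (exe_type, cpus, average, stdev); the `Row` type and `parse_row` helper are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    resource: str
    exe_type: str
    cpus: int
    average: float
    stdev: float


def parse_row(line: str) -> Row:
    # The resource name may contain spaces, so split from the right:
    # the last four whitespace-separated tokens are
    # exe_type, cpus, average and stdev (assumed layout, per the header row).
    resource, exe_type, cpus, average, stdev = line.rsplit(maxsplit=4)
    return Row(resource, exe_type, int(cpus), float(average), float(stdev))


row = parse_row(
    "Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 46535.57 237.15"
)
print(row.resource)
print(row.exe_type, row.cpus, row.average, row.stdev)
```

The same helper should work for the short Ookami rows, where the resource name is a single token.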

Fast Fourier Transform (FFTW) Floating-Point Performance, MFLOP per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 7522.22 297.01
Ookami mvapich2/libsci/intfft/cce 48 14492.00 505.40
Ookami openmpi/armpl/armpl/gcc 48 294.68 9.62
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 15388.25 189.37
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 16445.68 513.88
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 18292.79 981.23
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 15490.95 340.47
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 16954.09 226.83
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 17948.42 1564.26
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 15247.34 165.05
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 15983.20 422.39
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 17292.98 4492.64
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 43670.26 780.33
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 46666.67 502.63
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 21765.33 1191.43
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 14499.70 304.43

Parallel Matrix Transpose (PTRANS), MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 21629.24 1939.27
Ookami mvapich2/libsci/intfft/cce 48 21989.89 1603.33
Ookami openmpi/armpl/armpl/gcc 48 15668.95 1714.61
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 10720.32 326.78
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 9841.70 136.43
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 16847.96 2121.25
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 10654.13 279.48
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 10037.08 191.98
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 18119.24 1325.63
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 10576.29 438.45
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 9855.29 260.41
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 17883.64 1688.42
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 18074.71 446.00
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 13965.83 209.40
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 9624.68 146.03
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 7165.57 173.95

MPI Random Access, MUpdate per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 13.63 0.15
Ookami mvapich2/libsci/intfft/cce 48 13.89 0.21
Ookami openmpi/armpl/armpl/gcc 48 22.41 0.21
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 259.84 3.36
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 252.05 4.77
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 173.59 6.85
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 264.26 2.07
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 248.71 5.50
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 177.20 14.11
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 261.14 2.88
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 253.37 3.62
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 176.40 14.10
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 39.27 0.50
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 3.26 0.03
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 40.03 0.99
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 118.39 18.94

Average Double-Precision General Matrix Multiplication (DGEMM) Floating-Point Performance, MFLOP per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 955917.84 1106.20
Ookami mvapich2/libsci/intfft/cce 48 958233.60 453.72
Ookami openmpi/armpl/armpl/gcc 48 118226.30 985.28
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 57371.47 66.41
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 61774.21 64.72
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 64949.94 178.61
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 777500.16 2118.86
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 523424.00 4498.68
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 452975.36 3563.49
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 630826.24 3461.07
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 517666.05 1939.81
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 414587.39 3954.54
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 2106614.40 77528.00
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 334587.27 14445.84
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 696314.16 13099.94
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 724319.12 13386.87

Average STREAM ‘Triad’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 639150.98 1321.16
Ookami mvapich2/libsci/intfft/cce 48 640054.64 990.97
Ookami openmpi/armpl/armpl/gcc 48 611984.67 2283.09
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 157604.12 1781.38
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 170510.52 9350.40
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 173858.07 56420.11
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 158843.60 2085.14
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 167578.30 9121.34
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 154225.42 20570.22
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 157605.30 2097.07
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 169241.35 6080.71
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 147068.76 1834.26
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 149741.62 2748.06
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 315214.94 23818.48
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 89959.69 872.67
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 87356.19 532.06
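
The stdev column is easier to compare across rows when it is expressed relative to the average. The sketch below does this for a few rows copied from the STREAM 'Triad' table above; the `relative_stdev` helper is a hypothetical name, and the short labels are abbreviations of the resource and exe_type columns.

```python
# (label, average MByte/s, stdev MByte/s), copied from the STREAM 'Triad' table
triad_rows = [
    ("Ookami mvapich2/libsci/fftw3/cce, 48 cpus", 639150.98, 1321.16),
    ("Koinu openmpi/sysblas/intfft, 256 cpus", 173858.07, 56420.11),
    ("Stampede2 KNL impi/mkl, 68 cpus", 315214.94, 23818.48),
    ("Bridges impi/mkl, 28 cpus", 87356.19, 532.06),
]


def relative_stdev(average: float, stdev: float) -> float:
    # Standard deviation as a fraction of the average (coefficient of variation).
    return stdev / average


for label, average, stdev in triad_rows:
    print(f"{label}: {100 * relative_stdev(average, stdev):.1f}% of average")
```

Rows with a large relative spread, such as the 256-process Koinu run, are worth treating with more caution than rows where the spread is a fraction of a percent.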

Average STREAM ‘Add’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 637376.59 2082.42
Ookami mvapich2/libsci/intfft/cce 48 638134.27 988.38
Ookami openmpi/armpl/armpl/gcc 48 612510.84 2623.65
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 151023.98 1806.49
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 169686.86 3973.33
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 179711.27 68912.78
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 150678.80 1669.13
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 173855.87 6920.44
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 169258.83 32883.04
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 152415.90 7146.12
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 170028.17 8595.58
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 155504.82 2356.32
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 149137.05 2318.96
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 316756.66 21596.90
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 89668.49 981.04
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 86956.61 489.94

Average STREAM ‘Copy’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 559581.5 2452.81
Ookami mvapich2/libsci/intfft/cce 48 560504.3 1715.93
Ookami openmpi/armpl/armpl/gcc 48 551217.3 11364.29
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 129057.8 5813.76
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 143716.5 5157.72
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 163230.1 65730.92
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 132548.7 6335.55
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 137553.0 1679.58
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 168196.8 37854.83
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 133553.3 6625.87
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 139392.5 1139.72
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 146985.1 12762.55
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 132971.1 2353.97
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 256390.0 7269.67
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 95779.5 306.79
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 100068.5 385.94

Average STREAM ‘Scale’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 575593.27 1399.44
Ookami mvapich2/libsci/intfft/cce 48 574900.72 2337.44
Ookami openmpi/armpl/armpl/gcc 48 558423.00 1868.18
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 145269.72 3584.16
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 151182.25 4360.15
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 152769.40 50916.58
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 143432.81 923.46
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 149246.44 4300.89
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 144922.80 24866.73
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 146640.08 2347.68
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 145871.47 6602.79
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 138834.45 2532.77
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 131272.56 2988.52
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 270395.12 20602.43
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 79454.04 361.65
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 77124.61 480.41

Performance per Core (that is, per MPI Process)
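
The per-core values in this section appear to be the overall values divided by the cpus column (the number of MPI processes). The sketch below checks that relation for a few High Performance LINPACK rows copied from the tables; `per_core` is a hypothetical helper name, and the labels are abbreviations of the resource and exe_type columns.

```python
# (label, cpus, overall MFLOP/s, reported per-core MFLOP/s), copied from the HPL tables
hpl_rows = [
    ("Ookami mvapich2/libsci/fftw3/cce", 48, 800391.75, 16674.83),
    ("Koinu openmpi/openblas/intfft", 64, 581802.00, 9090.66),
    ("Stampede2 SKX impi/mkl", 48, 988258.60, 20588.72),
    ("Comet mvapich2/mkl", 24, 515101.60, 21462.57),
]


def per_core(overall: float, cpus: int) -> float:
    # Overall performance divided by the number of MPI processes.
    return overall / cpus


for label, cpus, overall, reported in hpl_rows:
    derived = per_core(overall, cpus)
    print(f"{label}: derived {derived:.2f}, reported {reported:.2f}")
```

For these rows the derived and reported values agree to within rounding of the second decimal place.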

High Performance LINPACK Floating-Point Performance, MFLOP per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 16674.83 109.05
Ookami mvapich2/libsci/intfft/cce 48 16561.93 110.30
Ookami openmpi/armpl/armpl/gcc 48 75.24 0.19
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 727.12 3.71
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 367.39 1.10
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 177.63 0.27
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 9090.66 49.57
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 3401.31 7.75
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 1291.57 9.83
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 7585.64 19.42
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 3578.42 16.00
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 1647.08 12.21
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 20588.72 420.81
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 14515.39 207.04
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 21462.57 439.68
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 19793.02 2243.50

Fast Fourier Transform (FFTW) Floating-Point Performance, MFLOP per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 156.71 6.19
Ookami mvapich2/libsci/intfft/cce 48 301.92 10.53
Ookami openmpi/armpl/armpl/gcc 48 6.14 0.20
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 240.44 2.96
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 128.48 4.01
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 71.46 3.83
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 242.05 5.32
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 132.45 1.77
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 70.11 6.11
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 238.24 2.58
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 124.87 3.30
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 67.55 17.55
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 909.80 16.26
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 686.27 7.39
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 906.89 49.64
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 517.85 10.87

Parallel Matrix Transpose (PTRANS), MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 450.61 40.40
Ookami mvapich2/libsci/intfft/cce 48 458.12 33.40
Ookami openmpi/armpl/armpl/gcc 48 326.44 35.72
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 167.50 5.11
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 76.89 1.07
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 65.81 8.29
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 166.47 4.37
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 78.41 1.50
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 70.78 5.18
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 165.25 6.85
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 76.99 2.03
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 69.86 6.60
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 376.56 9.29
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 205.38 3.08
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 401.03 6.08
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 255.91 6.21

MPI Random Access, MUpdate per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 0.28 0.00
Ookami mvapich2/libsci/intfft/cce 48 0.29 0.00
Ookami openmpi/armpl/armpl/gcc 48 0.47 0.00
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 4.06 0.05
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 1.97 0.04
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 0.68 0.03
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 4.13 0.03
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 1.94 0.04
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 0.69 0.06
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 4.08 0.04
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 1.98 0.03
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 0.69 0.06
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 0.82 0.01
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 0.05 0.00
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 1.67 0.04
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 4.23 0.68

Average Double-Precision General Matrix Multiplication (DGEMM) Floating-Point Performance, MFLOP per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 19914.95 23.05
Ookami mvapich2/libsci/intfft/cce 48 19963.20 9.45
Ookami openmpi/armpl/armpl/gcc 48 2463.05 20.53
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 896.43 1.04
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 482.61 0.51
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 253.71 0.70
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 12148.44 33.11
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 4089.25 35.15
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 1769.43 13.92
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 9856.66 54.08
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 4044.27 15.15
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 1619.48 15.45
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 43887.80 1615.17
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 4920.40 212.44
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 29013.09 545.83
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 25868.54 478.10

Average STREAM ‘Triad’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 13315.65 27.52
Ookami mvapich2/libsci/intfft/cce 48 13334.47 20.65
Ookami openmpi/armpl/armpl/gcc 48 12749.68 47.56
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 2462.56 27.83
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 1332.11 73.05
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 679.13 220.39
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 2481.93 32.58
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 1309.21 71.26
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 602.44 80.35
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 2462.58 32.77
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 1322.20 47.51
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 574.49 7.17
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 3119.62 57.25
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 4635.51 350.27
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 3748.32 36.36
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 3119.86 19.00

Average STREAM ‘Add’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 13278.68 43.38
Ookami mvapich2/libsci/intfft/cce 48 13294.46 20.59
Ookami openmpi/armpl/armpl/gcc 48 12760.64 54.66
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 2359.75 28.23
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 1325.68 31.04
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 702.00 269.19
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 2354.36 26.08
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 1358.25 54.07
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 661.17 128.45
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 2381.50 111.66
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 1328.35 67.15
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 607.44 9.20
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 3107.02 48.31
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 4658.19 317.60
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 3736.19 40.88
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 3105.59 17.50

Average STREAM ‘Copy’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 11657.95 51.10
Ookami mvapich2/libsci/intfft/cce 48 11677.17 35.75
Ookami openmpi/armpl/armpl/gcc 48 11483.69 236.76
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 2016.53 90.84
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 1122.79 40.29
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 637.62 256.76
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 2071.07 98.99
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 1074.63 13.12
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 657.02 147.87
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 2086.77 103.53
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 1089.00 8.90
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 574.16 49.85
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 2770.23 49.04
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 3770.44 106.91
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 3990.81 12.78
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 3573.87 13.78

Average STREAM ‘Scale’ Memory Bandwidth, MByte per Second

resource exe_type cpus average stdev
Ookami mvapich2/libsci/fftw3/cce 48 11991.53 29.15
Ookami mvapich2/libsci/intfft/cce 48 11977.10 48.70
Ookami openmpi/armpl/armpl/gcc 48 11633.81 38.92
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 64 2269.84 56.00
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 128 1181.11 34.06
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/sysblas/intfft 256 596.76 198.89
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 64 2241.14 14.43
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 128 1165.99 33.60
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/openblas/intfft 256 566.10 97.14
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 64 2291.25 36.68
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 128 1139.62 51.58
Koinu (CPU ARM ThunderX2 32(128) cores x2) openmpi/armpl/intfft 256 542.32 9.89
Stampede2 SKX (CPU Intel Xeon Platinum 8160 24(48) cores x2) impi/mkl 48 2734.84 62.26
Stampede2 KNL (CPU Intel Xeon Phi 7250 68(272) cores x2) impi/mkl 68 3976.40 302.98
Comet (CPU Intel Xeon E5-2680v3 12 cores x2, Haswell) mvapich2/mkl 24 3310.58 15.07
Bridges (Intel E5-2695 v3 14 cores x2, Haswell) impi/mkl 28 2754.45 17.16